5 research outputs found

Learning video embedding space with Natural Language Supervision

Authors: Bamotra Abhishek, Joshi Vaidehi, Priya Shriti, Uppala Phani Krishna
Publication date: 07/04/2023

The recent success of the CLIP model has shown its potential to be applied to a wide range of vision and language tasks. However, this only establishes an embedding-space relationship between language and images, not the video domain. In this paper, we propose a novel approach to map the video embedding space to natural language. We propose a two-stage approach that first extracts visual features from each frame of a video using a pre-trained CNN, and then uses the CLIP model to encode the visual features for the video domain, along with the corresponding text descriptions. We evaluate our method on two benchmark datasets, UCF101 and HMDB51, and achieve state-of-the-art performance on both.

arXiv.org e-Print Archive

AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation

Authors: Babu R Venkatesh, Kundu Jogendra Nath, Pahuja Anuj, Uppala Phani Krishna
Publication venue: IEEE
Publication date: 01/06/2018

Supervised deep learning methods have shown promising results for the task of monocular depth estimation, but acquiring ground truth is costly and prone to noise and inaccuracies. While synthetic datasets have been used to circumvent these problems, the resulting models do not generalize well to natural scenes due to the inherent domain shift. Recent adversarial approaches for domain adaptation have performed well in mitigating the differences between the source and target domains, but these methods are mostly limited to a classification setup and do not scale well to fully convolutional architectures. In this work, we propose AdaDepth, an unsupervised domain adaptation strategy for the pixel-wise regression task of monocular depth estimation. The proposed approach avoids these limitations through a) adversarial learning and b) explicit imposition of content consistency on the adapted target representation. Our unsupervised approach performs competitively with other established approaches on depth estimation tasks and achieves state-of-the-art results in a semi-supervised setting.

Crossref

Open Access Repository of IISc Research Publications